Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.
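
The contrast between rank-1 and delayed updating can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each accepted move replaces one row of the Slater matrix, that the K moves in a block touch distinct rows, and that all of them are accepted. The function names sherman_morrison_row_update and delayed_row_update are hypothetical, and the block update uses the standard Woodbury identity as a stand-in for the scheme described above.

```python
import numpy as np

def sherman_morrison_row_update(Ainv, k, v):
    """Rank-1 update: replace row k of A by v.

    Returns (det(A_new) / det(A_old), inverse of A_new).
    """
    ratio = v @ Ainv[:, k]            # determinant ratio for this single move
    row = v @ Ainv                    # v^T A^{-1}: a matrix-vector product
    row[k] -= 1.0                     # (v - a_k)^T A^{-1} = v^T A^{-1} - e_k^T
    return ratio, Ainv - np.outer(Ainv[:, k], row) / ratio

def delayed_row_update(Ainv, ks, V):
    """Rank-K update: replace rows ks of A by the rows of V in one step.

    Assumes the indices in ks are distinct (an assumption of this sketch).
    Returns (det(A_new) / det(A_old), inverse of A_new).
    """
    ks = np.asarray(ks)
    B = V @ Ainv                      # matrix-matrix product, shape (K, n)
    S = B[:, ks]                      # small K x K matrix (copied by fancy indexing)
    B[np.arange(len(ks)), ks] -= 1.0  # B <- V A^{-1} - E^T
    ratio = np.linalg.det(S)          # matrix determinant lemma
    return ratio, Ainv - Ainv[:, ks] @ np.linalg.solve(S, B)

# Small well-conditioned test problem (illustrative values only).
rng = np.random.default_rng(0)
n, K = 8, 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
Ainv = np.linalg.inv(A)
ks = [1, 4, 6]                        # rows (electrons) moved in this block
V = np.eye(n)[ks] + 0.1 * rng.standard_normal((K, n))

# Reference path: K successive rank-1 Sherman-Morrison updates.
Ainv_seq, ratio_seq = Ainv.copy(), 1.0
for k, v in zip(ks, V):
    r, Ainv_seq = sherman_morrison_row_update(Ainv_seq, k, v)
    ratio_seq *= r

# Delayed path: one rank-K update built from matrix-matrix products.
ratio_blk, Ainv_blk = delayed_row_update(Ainv, ks, V)

# Explicit updated matrix, used only to check the two paths.
A_new = A.copy()
A_new[ks] = V
```

In this sketch both paths produce the same inverse and the same accumulated determinant ratio. The delayed path replaces K outer-product updates of the full n x n inverse with two matrix-matrix products and one small K x K solve, which is where the higher arithmetic intensity comes from.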